Private ethernet overlay networks over a shared ethernet in a virtual environment

ABSTRACT

A system for private networking within a virtual infrastructure is presented. The system includes a virtual machine (VM) in a first host, the VM being associated with a first virtual network interface card (VNIC), a second VM in a second host, the second VM being associated with a second VNIC, the first and second VNICs being members of a fenced group of computers that have exclusive direct access to a private virtual network, wherein VNICs outside the fenced group do not have direct access to packets on the private virtual network, a filter in the first host that encapsulates a packet sent on the private virtual network from the first VNIC, the encapsulation adding to the packet a new header and a fence identifier for the fenced group, and a second filter in the second host that de-encapsulates the packet to extract the new header and the fence identifier.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Divisional of U.S. patent application Ser. No. 12/819,438 filed Jun. 21, 2010, issued as U.S. Pat. No. 8,892,706, which is hereby incorporated by reference.

This application is related by subject matter to U.S. patent application Ser. No. 12/510,072, filed Jul. 27, 2009, and entitled “AUTOMATED NETWORK CONFIGURATION OF VIRTUAL MACHINES IN A VIRTUAL LAB ENVIRONMENT”; U.S. patent application Ser. No. 12/510,135, filed Jul. 27, 2009, and entitled “MANAGEMENT AND IMPLEMENTATION OF ENCLOSED LOCAL NETWORKS IN A VIRTUAL LAB”; U.S. patent application Ser. No. 12/571,224, filed Sep. 30, 2009, and entitled “PRIVATE ALLOCATED NETWORKS OVER SHARED COMMUNICATIONS INFRASTRUCTURE”; and U.S. patent application Ser. No. 11/381,119, filed May 1, 2006, and entitled “VIRTUAL NETWORK IN SERVER FARM”, all of which are incorporated herein by reference.

1. FIELD OF THE INVENTION

The present invention relates to methods, systems, and computer programs for deploying fenced groups of Virtual Machines (VMs) in a virtual infrastructure, and more particularly, to methods, systems, and computer programs for private networking among fenced groups of VMs executing in multiple hosts of the virtual infrastructure.

2. DESCRIPTION OF THE RELATED ART

Virtualization of computer resources generally involves abstracting computer hardware, which essentially isolates operating systems and applications from underlying hardware. Hardware is therefore shared among multiple operating systems and applications wherein each operating system and its corresponding applications are isolated in corresponding VMs and wherein each VM is a complete execution environment. As a result, hardware can be more efficiently utilized.

The virtualization of computer resources sometimes requires the virtualization of networking resources. To create a private network in a virtual infrastructure means that a set of virtual machines have exclusive access to this private network. However, virtual machines can be located in multiple hosts that may be connected to different physical networks. Trying to impose a private network on a distributed environment encompassing multiple physical networks is a complex problem. Further, sending a broadcast message in a private network presents two problems. First, the broadcast may be received by hosts which do not host any VMs in the private network, thus reducing the scalability of the entire distributed system. Second, if hosts are not located on adjacent layer 2 networks, the broadcast may not reach all hosts with VMs in the private network.

Virtual Local Area Networks (VLAN) are sometimes used to implement distributed networks for a set of computing resources that are not connected to one physical network. A VLAN is a group of hosts that communicate as if the group of hosts were attached to the same broadcast domain, regardless of their physical location. A VLAN has the same attributes as a physical Local Area Network (LAN), but the VLAN allows for end stations to be grouped together even if the end stations are not located on the same network switch. Network reconfiguration can be done through software instead of by physically relocating devices. Routers in VLAN topologies provide broadcast filtering, security, address summarization, and traffic flow management. However, VLANs only offer encapsulation and, by definition, switches may not bridge traffic between VLANs as it would violate the integrity of the VLAN broadcast domain. Further, VLANs are not easily programmable by a centralized virtual infrastructure manager.

Virtual labs, such as VMware's vCenter Lab Manager™ from the assignee of the present patent application, enable application development and test teams to create and deploy complex multi-tier system and network configurations on demand quickly. Testing engineers can set up, capture, and reset virtual machine configurations for demonstration environments in seconds. In addition, hands-on labs can be quickly configured and deployed for lab testing, hands-on training classes, etc.

The creation of virtual lab environments requires flexible tools to assist in the creation and management of computer networks. For example, if a test engineer decides to perform different tests simultaneously on one sample environment, the test engineer must deploy the sample environment multiple times. The multiple deployments must coexist in the virtual infrastructure. However, these environments often have network configurations that, when deployed multiple times, would cause network routing problems, such as the creation of VMs with duplicate Internet Protocol (IP) addresses, an impermissible network scenario for the proper operation of the VMs and of the virtual lab environments.

Existing solutions required that VMs within the same private environment be executed on the same host using virtual switches in the host. However, the single-host implementation has drawbacks, such as a maximum number of VMs that can be deployed on a single host, inability to move VMs to different hosts for load balancing, unexpected host shutdowns, etc.

SUMMARY

Methods, systems, and computer programs for implementing private networking within a virtual infrastructure are presented. It should be appreciated that the present invention can be implemented in numerous ways, such as a process, an apparatus, a system, a device or a method on a computer readable medium. Several inventive embodiments of the present invention are described below.

In one embodiment, a method includes an operation for sending a packet on a private virtual network from a first virtual machine (VM) in a first host to a second VM. The first and second VMs are members of a fenced group of computers that have exclusive direct access to the private virtual network, where VMs outside the fenced group do not have direct access to the packets that travel on the private virtual network. Further, the method includes encapsulating the packet at the first host to include a new header as well as a fence identifier for the fenced group. The packet is received at a host where the second VM is executing and the packet is de-encapsulated to extract the new header and the fence identifier. Additionally, the method includes an operation for delivering the de-encapsulated packet to the second VM after validating that the destination address in the packet and the fence identifier correspond to the destination address and the fence identifier, respectively, of the second VM.

In another embodiment, a computer program embedded in a non-transitory computer-readable storage medium, when executed by one or more processors, for implementing private networking within a virtual infrastructure, includes program instructions for sending a packet on a private virtual network from a first VM in a first host to a second VM. The first and second VMs are members of a fenced group of computers that have exclusive direct access to the private virtual network, where VMs outside the fenced group do not have direct access to packets on the private virtual network. Further, the computer program includes program instructions for encapsulating the packet at the first host to include a new header and a fence identifier for the fenced group, and for receiving the packet at a host where the second VM is executing. Further yet, the computer program includes program instructions for de-encapsulating the packet to extract the new header and the fence identifier, and program instructions for delivering the de-encapsulated packet to the second VM after validating that a destination address in the packet and the fence identifier correspond to the second VM.

In yet another embodiment, a system for private networking within a virtual infrastructure includes a first VM and a first filter in a first host, in addition to a second VM and a second filter in a second host. The first and second VMs are members of a fenced group of computers that have exclusive direct access to a private virtual network, where VMs outside the fenced group do not have direct access to packets on the private virtual network. The first filter encapsulates a packet sent on a private virtual network from the first VM, by adding to the packet a new header and a fence identifier for the fenced group. The second filter de-encapsulates the packet to extract the new header and the fence identifier, and the second filter delivers the de-encapsulated packet to the second VM after validating that a destination address in the packet and the fence identifier correspond to the second VM.

Other aspects of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1 includes an architectural diagram of an embodiment of a virtual infrastructure system.

FIG. 2 depicts one embodiment of the host architecture for instantiating Virtual Machines (VM) with multiple Virtual Network Interface Cards (VNIC).

FIG. 3 illustrates the deployment of multiple VM configurations, according to one embodiment.

FIG. 4 illustrates the use of packet filters within the host, according to one embodiment.

FIG. 5 shows one embodiment for Ethernet frame encapsulation.

FIG. 6 provides a detailed illustration of the encapsulated packet, in accordance with one embodiment of the invention.

FIG. 7 illustrates the flow of a broadcast packet sent within the private virtual network, according to one embodiment.

FIG. 8 illustrates the flow of the response packet to the broadcast, according to one embodiment.

FIG. 9A illustrates the flow of a packet travelling between VMs in the same host, according to one embodiment.

FIG. 9B illustrates the flow of an Internet Protocol (IP) packet, according to one embodiment.

FIG. 10 illustrates the update of bridge tables when a VM migrates to a different host, according to one embodiment.

FIG. 11 shows the structure of a Maximum Transmission Unit (MTU) configuration table, according to one embodiment.

FIG. 12 shows one embodiment of an active-ports table.

FIG. 13 shows an embodiment of a bridge table.

FIG. 14 shows the process flow of a method for private networking within a virtual infrastructure, in accordance with one embodiment of the invention.

DETAILED DESCRIPTION

The following embodiments describe methods and apparatus for implementing private networking within a virtual infrastructure. Embodiments of the invention use Media Access Control (MAC) encapsulation of Ethernet packets. The hosts that include Virtual Machines (VM) from fenced groups of machines implement distributed switching with learning for unicast delivery. As a result, VMs are allowed to migrate to other hosts to enable resource management and High Availability (HA). Further, the private network implementation is transparent to the guest operating system (GOS) in the VMs and provides an added level of privacy.

With a host-spanning private network (HSPN), VMs can be placed on any host where the private network is implemented. The HSPN may span hosts in a cluster or clusters in a datacenter, allowing large groups of VMs to communicate over the private network. Additionally, VMs may move between hosts since VMs maintain private network connectivity. A VM can also be powered on in a different host after failover and still retain network connectivity. Further, VMs get their own isolated private layer 2 connectivity without the need to obtain Virtual Local Area Network (VLAN) ID resources or even set up VLANs. Creating an HSPN is therefore simpler because there is no dependency on the network administrator. The HSPN can be deployed on either a VLAN or an Ethernet segment.

It should be appreciated that some embodiments of the invention are described below using Ethernet, Internet Protocol (IP), and Transmission Control Protocol (TCP) protocols. Other embodiments may utilize different protocols, such as an Open Systems Interconnection (OSI) network stack, and the same principles described herein apply. The embodiments described below should therefore not be interpreted to be exclusive or limiting, but rather exemplary or illustrative.

It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

FIG. 1 includes an architectural diagram of an embodiment of a virtual infrastructure system. Virtual infrastructure 102 includes one or more virtual infrastructure servers 104 that manage a plurality of hosts 106. Virtual machines 108 are instantiated in hosts 106, and the multiple hosts share a plurality of resources within the virtual infrastructure, such as shared storage 110. A configuration is a core element of a virtual lab and is composed of virtual machines and virtual lab networks. Virtual lab users can group, deploy, save, share, and monitor multi-machine configurations. Configurations reside in the library or in user workspaces, in which case they are referred to as workspace configurations.

Many applications run on more than one machine, and grouping the machines in one configuration makes the applications more convenient to manage. For example, in a classic client-server application, the database server may run on one machine, the application server on another machine, and the client on a third machine. All these machines would be configured to run with each other. Other servers may execute related applications, such as LDAP servers, Domain Name servers, domain controllers, etc. Virtual lab server allows the grouping of these dependent machines into a Configuration, which can be checked in and out of the library. When a configuration is checked out, all the dependent machines configured to work with each other are activated at the same time. Library configurations can also store the running state of machines so the deployment of machines that are already running is faster.

Virtual lab networks, also referred to herein as enclosed local networks, can be categorized as private networks and shared networks. Private networks in a configuration are those networks available exclusively to VMs in the configuration, that is, only VMs in the configuration can have a Network Interface Controller (NIC) or VNIC connected directly to a switch or virtual switch (VSwitch) for the private network. Access to data on a private network is restricted to members of the configuration, that is, the private network is isolated from other entities outside the configuration. In one embodiment, a private network in the configuration can be connected to a physical network to provide external connectivity to the VMs in the private network. Private networks in a configuration are also referred to herein as Configuration Local Networks (CLN) or virtual networks. Shared networks, also referred to herein as shared physical networks or physical networks, are available to all VMs in the virtual infrastructure, which means that a configuration including a shared network will enable VMs in the shared network to communicate with other VMs in the virtual infrastructure connected, directly or indirectly, to the shared network. In one embodiment, a shared network is part of a Virtual Local Area Network (VLAN).

Deploying a configuration causes the VMs and networks in the configuration to be instantiated in the virtual infrastructure. Instantiating the VMs includes registering the VMs in the virtual infrastructure and powering on the VMs. When an individual VM from a configuration is deployed, virtual lab deploys all shared networks and CLNs associated with the configuration using the network connectivity options in the configuration. Undeploying a configuration de-instantiates the VMs in the configuration from the virtual infrastructure. De-instantiating VMs includes powering off or suspending the VMs and un-registering the VMs from the virtual infrastructure. The state of the deployment can be saved in storage or discarded. Saving the memory state helps debugging memory-specific issues and makes VMs in the configuration ready for deployment and use almost instantly.

Virtual lab server 112 manages and deploys virtual machine configurations in a collection of hosts 106. It should be appreciated that not all hosts 106 need to be part of the scope of virtual lab server 112, although in one embodiment, all the hosts are within the scope of virtual lab server 112. Virtual lab server 112 manages hosts 106 by communicating with virtual infrastructure server 104, and by using virtual lab server agents installed on those hosts. In one embodiment, virtual lab server 112 communicates with virtual infrastructure server 104 via an Application Programming Interface (API), for example, to request the instantiation of VMs and networks.

Although virtual lab server 112 is used to perform some management tasks on hosts 106, the continuous presence of virtual lab server 112 is not required for the normal operation of deployed VMs, which can continue to run even if virtual lab server 112 becomes unreachable, for example because of a network failure. One or more users 116 interface with virtual lab server 112 and virtual infrastructure 102 via a computer interface, which in one embodiment is a web browser.

FIG. 2 depicts one embodiment of the host architecture for instantiating VMs with multiple Virtual Network Interface Cards (VNIC). In one embodiment, VMkernel 204, also referred to as virtual infrastructure layer, manages the assignment of VMs 206 in host 202. VM 206 includes Guest Operating System (GOS) 208 and multiple VNICs 210. Each VNIC 210 is connected to a VSwitch 212 that provides network switch functionality for the corresponding virtual network interfaces. VSwitches 212 are connected to a physical NIC device in the host to provide access to physical network 216. Each of the VNICs and VSwitches are independent, thus a VM can connect to several virtual networks via several VNICs that connect to one or more physical NIC devices 214. In another embodiment, each VSwitch 212 is connected to a different physical NIC device, thus each VSwitch 212 provides connectivity to a different physical network. In the sample configuration illustrated in FIG. 2, VSwitch 212 provides switching for virtual networks “Network 1” (VNIC1) and “Network 4” (VNIC4). VSwitch 212 assigns a set of ports to “Network 1” and a different set of ports to “Network 4,” where each set of ports supports Media Access Control (MAC) addressing for the corresponding virtual network. Thus, packets from “Network 1” coexist with packets from “Network 4” on the same transmission media.

The virtual computer system supports VM 206. As in conventional computer systems, both system hardware 220 and system software are included. The system hardware 220 includes one or more processors (CPUs) 222, which may be a single processor, or two or more cooperating processors in a known multiprocessor arrangement. The system hardware also includes system memory 226, one or more disks 228, and some form of Memory Management Unit (MMU) 224. The system memory is typically some form of high-speed RAM (random access memory), whereas the disk is typically a non-volatile, mass storage device. As is well understood in the field of computer engineering, the system hardware also includes, or is connected to, conventional registers, interrupt handling circuitry, a clock, etc., which, for the sake of simplicity, are not shown in the figure.

The system software includes VMkernel 204, which has drivers for controlling and communicating with various devices 230, NICs 214, and disk 228. In VM 206, the physical system components of a “real” computer are emulated in software, that is, they are virtualized. Thus, VM 206 will typically include virtualized guest OS 208 and virtualized system hardware (not shown), which in turn includes one or more virtual CPUs, virtual system memory, one or more virtual disks, one or more virtual devices, etc., all of which are implemented in software to emulate the corresponding components of an actual computer.

The guest OS 208 may, but need not, simply be a copy of a conventional, commodity OS. The interface between VM 206 and the underlying host hardware 220 is responsible for executing VM related instructions and for transferring data to and from the actual physical memory 226, the processor(s) 222, the disk(s) 228 and other devices.

FIG. 3 illustrates the deployment of multiple VM configurations, according to one embodiment. Configuration 302, which includes VMs A, B, and C, is deployed a first time resulting in deployment 304. When a configuration of machines is copied, the system performs the copying, also referred to as cloning, in a short amount of time, taking a fraction of the disk space a normal copy would take. This is referred to as linked clones. For example, when a virtual lab server VM with an 80 GB disk is cloned, the 80 GB are not copied. Instead, a 16 MB file called a linked clone is created, which points to the 80 GB disk and acts like a new instance of the disk.

Another feature of virtual lab server is the ability to use multiple copies of VMs simultaneously, without modifying them. When machines are copied using traditional techniques, the original and the copy cannot be used simultaneously due to duplicate IP addresses, MAC addresses, and security IDs (in the case of Windows). Virtual lab server provides a networking technology called “fencing” that allows multiple unchanged copies of virtual lab server VMs to be run simultaneously on the same network without conflict, while still allowing the VMs to access network resources and be accessed remotely.

FIG. 3 illustrates the process of deploying fenced VMs. The first deployment 304 can use the IP and Ethernet addresses in configuration 302 and be directly connected to the network without any conflicts (of course assuming no other VMs in the network have the same addresses). Deployment 306 is created after cloning the first deployment 304. However, deployment 306 cannot be connected directly to the network because there would be duplicate addresses on the network.

Deployment 308 is deployed in fenced mode, including private networking module 310, which performs, among other things, filtering and encapsulation of network packets before sending the packets on the physical network. This way, there is no duplication of addresses in the physical network.

FIG. 4 illustrates the use of packet filters within host 402, according to one embodiment. Host 402 includes several VMs. Each VM is associated with a VNIC 404. In general, it is assumed that each VM only has one network connection, which means one VNIC. However, it is possible for a VM to have multiple network connections, where each connection is associated with a different VNIC. Principles of the invention can also be applied when a VM has multiple VNICs, because the important consideration is that each VNIC be associated with one layer 2 and one layer 3 address. Therefore, it would be more precise to refer to VNICs instead of VMs, but for ease of description VMs are used in the description of some embodiments. However, it is understood that if a VM has more than one VNIC, then each VNIC would be separately considered and belonging to a separate private network.

A Distributed Virtual (DV) Filter 408 is associated with VNIC 404 and performs filtering and encapsulation of packets originating in VNIC 404 before the packets reach distributed vSwitch 410. On the receiving side, DV Filter 408 performs filtering and de-encapsulation (stripping) when needed. Distributed vSwitch 410 is connected to one or more physical NICs (PNIC) 412 that connect host 402 to physical network 414.

The use of a DV Filter enables the implementation of the cross-host private virtual network. The DV Filter is compatible with VLAN and other overlay solutions because the encapsulation performed by DV Filter 408 is transparent to switches and routers on the network. More details on the operation of DV Filter 408 are given below in reference to FIGS. 7-13.

FIG. 5 shows one embodiment for Ethernet frame encapsulation. A packet sent from a VM in the private virtual network includes original Ethernet header 506 and original payload 508. The DV Filter adds a new header and new data to the original packet. The encapsulation Ethernet header 502 has the standard fields of an Ethernet header. More details on the composition of encapsulation Ethernet header 502 are given below in reference to FIG. 6. DV Filter also adds fence protocol data 504 to the data field of the new Ethernet packet in front of the original packet. In other words, the payload for the new packet includes fence protocol data 504, original Ethernet header 506, and original payload 508.
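As a purely illustrative sketch of the frame layout of FIG. 5, the following Python fragment places an encapsulation Ethernet header and fence protocol data in front of an unmodified original frame, and then separates the three parts again. The function names and the 4-byte length chosen for the fence protocol data are assumptions made for this example and are not taken from any embodiment.

```python
from typing import Tuple

ETH_HEADER_LEN = 14  # destination (6) + source (6) + type/length (2)
FENCE_DATA_LEN = 4   # ASSUMPTION: size of the fence protocol data in this sketch


def encapsulate(outer_header: bytes, fence_data: bytes, original_frame: bytes) -> bytes:
    """Build the new frame: outer header, then fence data, then the original frame."""
    if len(outer_header) != ETH_HEADER_LEN:
        raise ValueError("outer Ethernet header must be 14 bytes")
    if len(fence_data) != FENCE_DATA_LEN:
        raise ValueError("fence protocol data must be 4 bytes in this sketch")
    # The original header and payload travel untouched as part of the new payload.
    return outer_header + fence_data + original_frame


def decapsulate(frame: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split an encapsulated frame into (outer header, fence data, original frame)."""
    if len(frame) < ETH_HEADER_LEN + FENCE_DATA_LEN:
        raise ValueError("frame too short to be encapsulated")
    outer_header = frame[:ETH_HEADER_LEN]
    fence_data = frame[ETH_HEADER_LEN:ETH_HEADER_LEN + FENCE_DATA_LEN]
    original_frame = frame[ETH_HEADER_LEN + FENCE_DATA_LEN:]
    return outer_header, fence_data, original_frame
```

Under these assumptions each encapsulated frame is 18 bytes longer than the original frame, which is the overhead that motivates the MTU considerations discussed next.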

Since the new encapsulated packet includes additional bytes, it is possible that the resulting encapsulated packet exceeds the Maximum Transmission Unit (MTU) of layer 2. In this case fragmentation is required. The encapsulated packet is fragmented into two different packets, transmitted separately, and the DV Filter at the receiving host de-encapsulates the two packets by combining their contents to recreate the encapsulated packet. In one embodiment, fragmentation is avoided by increasing the uplink MTU. In another embodiment, the VM is configured by the user with an MTU that is smaller than the MTU on the network, such that encapsulation can be performed on all packets without fragmentation.
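The fragmentation case can be sketched as follows. This is a minimal example, assuming that the original frame is cut at the point where the first piece, plus the bytes added by encapsulation, just fits in the MTU; the names `split_for_mtu` and `reassemble` and the `overhead` parameter are hypothetical and do not come from the described embodiments.

```python
from typing import List


def split_for_mtu(original_frame: bytes, mtu: int, overhead: int) -> List[bytes]:
    """Return one or two pieces so that each piece plus `overhead` fits in `mtu`.

    `overhead` is the number of bytes added to every piece by encapsulation.
    """
    room = mtu - overhead
    if room <= 0:
        raise ValueError("MTU is too small to carry any encapsulated data")
    if len(original_frame) <= room:
        return [original_frame]  # fits as is, no fragmentation needed
    if len(original_frame) > 2 * room:
        raise ValueError("frame would need more than two fragments")
    return [original_frame[:room], original_frame[room:]]


def reassemble(pieces: List[bytes]) -> bytes:
    """Receiving side: combine the pieces, in order, to recreate the frame."""
    return b"".join(pieces)
```

For example, with `mtu=1500` and `overhead=18`, a 1500-byte frame is split into pieces of 1482 and 18 bytes, while a frame of 1482 bytes or less is sent unfragmented. Configuring the VM with an MTU of at most `mtu - overhead` therefore avoids fragmentation altogether in this sketch.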

Because of the MAC-in-MAC Ethernet frame encapsulation, the traffic of the private virtual network is isolated from other traffic, in the sense that the Ethernet headers of the private network packets are “hidden” from view. Also, the private network packets terminate in hosts that implement the private networking, allowing an additional level of control and security. Switches and routers on the network do not see or have to deal with this encapsulation because they only see a standard Ethernet header, which is processed the same as any standard Ethernet header. As a result, no network infrastructure or additional resources are required to implement private networking, there are no MAC addressing collisions, and VLANs are interoperable with the private virtual network scheme. Also, a large number of private networks is possible (e.g., 16 million or more) per VLAN.

FIG. 6 provides a detailed illustration of the encapsulated packet, in accordance with one embodiment of the invention. As with any standard Ethernet header, the encapsulating header includes a destination address, a source address and a type/length (T/L) field. The source and destination addresses are formed by joining together a fence Organizationally-Unique-Identifier (OUI) (24 bits), an installation identifier (“Install ID”) (8 bits), and a host identifier (16 bits). An OUI is a 24-bit number that is purchased from the Institute of Electrical and Electronics Engineers, Incorporated (IEEE) Registration Authority. This identifier uniquely identifies a vendor, manufacturer, or other organization globally and effectively reserves a block of each possible type of derivative identifier (such as MAC addresses, group addresses, Subnetwork Access Protocol protocol identifiers, etc.) for the exclusive use of the assignee.
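The 48-bit address composition described above (24-bit OUI, 8-bit installation identifier, 16-bit host identifier) can be illustrated with a short Python sketch. The function names are hypothetical, the fields are assumed to be joined in the order listed with the OUI in the most significant bits, and the OUI value used in the usage note is a placeholder, not the reserved fence OUI.

```python
from typing import Tuple


def make_fence_mac(oui: int, install_id: int, host_id: int) -> bytes:
    """Join OUI (24 bits), Install ID (8 bits) and host ID (16 bits) into 6 bytes."""
    if not 0 <= oui < 1 << 24:
        raise ValueError("OUI must fit in 24 bits")
    if not 0 <= install_id < 1 << 8:
        raise ValueError("installation identifier must fit in 8 bits")
    if not 0 <= host_id < 1 << 16:
        raise ValueError("host identifier must fit in 16 bits")
    value = (oui << 24) | (install_id << 16) | host_id
    return value.to_bytes(6, "big")


def parse_fence_mac(mac: bytes) -> Tuple[int, int, int]:
    """Recover (OUI, Install ID, host ID) from a 6-byte address built as above."""
    if len(mac) != 6:
        raise ValueError("a MAC address is 6 bytes")
    value = int.from_bytes(mac, "big")
    return value >> 24, (value >> 16) & 0xFF, value & 0xFFFF
```

With the placeholder OUI `0x0A0B0C`, installation identifier `0x01`, and host identifier `0x0002`, `make_fence_mac` yields the bytes `0a 0b 0c 01 00 02`, and `parse_fence_mac` returns the three original fields.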

The fence OUI is a dedicated OUI reserved for private virtual networking. Therefore, there will not be address collisions on the network because nodes that are not part of the private networking scheme will not use the reserved fence OUI. The destination address in the encapsulating header can also be a broadcast address, and all the hosts in the network will receive this packet.

The virtual lab server installation ID is unique on a LAN segment and is managed by virtual lab server 112 (FIG. 1). The fence identifier uniquely identifies a private network within the virtual lab server. Fence IDs can be recycled over time. Further, the T/L field in the encapsulating header includes the fence Ethernet type, which is an IEEE assigned number (in this case assigned to VMware, the assignee of the present application) that identifies the protocol carried by the Ethernet frame. More specifically, the protocol identified is the Fence protocol, i.e., the protocol to perform MAC-in-MAC framing. The Ethernet type is used to distinguish one protocol from another.

The fence protocol data includes a version ID of the private network implementation or protocol (2 bits), a fragment type (2 bits), a fragment sequence number, and a fence identifier. The fragment type and sequence number indicate if the original packet has been fragmented, and if so, which fragment number corresponds to the packet. The fence identifier indicates a value assigned to the private virtual network. In one embodiment, this field is 24 bits, which allows for more than 16 million different private networks per real LAN.
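One possible packing of these fields is sketched below. The text gives the widths of the version ID (2 bits), the fragment type (2 bits), and the fence identifier (24 bits) but not the width of the fragment sequence number; the 4-bit sequence number, the resulting 32-bit total, and the field order used here are assumptions made only so that the example is concrete.

```python
from typing import Tuple


def pack_fence_data(version: int, frag_type: int, frag_seq: int, fence_id: int) -> bytes:
    """Pack the fence protocol data into 4 bytes (layout is an assumption)."""
    if not 0 <= version < 4:
        raise ValueError("version ID must fit in 2 bits")
    if not 0 <= frag_type < 4:
        raise ValueError("fragment type must fit in 2 bits")
    if not 0 <= frag_seq < 16:
        raise ValueError("sequence number must fit in 4 bits (assumed width)")
    if not 0 <= fence_id < 1 << 24:
        raise ValueError("fence identifier must fit in 24 bits")
    first_byte = (version << 6) | (frag_type << 4) | frag_seq
    return bytes([first_byte]) + fence_id.to_bytes(3, "big")


def unpack_fence_data(data: bytes) -> Tuple[int, int, int, int]:
    """Return (version, fragment type, sequence number, fence ID)."""
    if len(data) != 4:
        raise ValueError("expected 4 bytes of fence protocol data")
    first_byte = data[0]
    fence_id = int.from_bytes(data[1:4], "big")
    return first_byte >> 6, (first_byte >> 4) & 0x3, first_byte & 0xF, fence_id
```

A 24-bit fence identifier admits 2^24 = 16,777,216 distinct values, which is the source of the "more than 16 million" figure.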

It should be appreciated that the embodiments illustrated in FIGS. 5 and 6 are exemplary data fields for encapsulating network packets. Other embodiments may utilize different fields, or may arrange the data in varying manners. The embodiments illustrated in FIGS. 5 and 6 should therefore not be interpreted to be exclusive or limiting, but rather exemplary or illustrative.

FIG. 7 illustrates the flow of a broadcast packet sent within the private virtual network, according to one embodiment. FIG. 7 illustrates some of the events that take place after VM A 720 is initialized. During network initialization, VM A 720 sends a broadcast Address Resolution Protocol (ARP) packet 702 to be delivered to all nodes that have layer-2 connectivity on the private network associated with Fence 1. Fence 1 includes VM A 720 in Host 1 714 and VM B 742 in Host 2 716. It should be noted that VM B 744, also in Host 2 716, is a clone of VM B 742 and is connected to a different private network from the one connected to VM A 720 and VM B 742.

Packet 702 is a standard ARP broadcast packet including VM A's address as the source address. VM A 720 sends the message through port 722, which is associated with Fence 1. DV Filter 724 receives packet 704, associated with Fence 1, and adds the encapsulating header, as described above in reference to FIGS. 5 and 6, to create encapsulated packet 706. The destination address of the encapsulating header is also an Ethernet broadcast address. DV Filter 724 sends packet 706 to distributed vSwitch 726 for transmittal over the network via physical NIC 728.

Host 2 716 receives packet 706 (referred to as packet 708) and the Distributed vSwitch forwards packet 708 to the DV Filters for all VNICs, since it is a broadcast packet. DV Filter 734, associated with VM B 742, examines the source address. It determines that packet 708 is a private virtual network packet because of the unique fence OUI. This packet comes from Host 1 because the source address includes Host 1's ID, and it is originally from VM A because VM A's Ethernet address is in the original Ethernet header. Since DV Filter 734 did not have an entry for VM A in that private network, an entry is added to bridge table 746 mapping VM A with Host 1. More details on the structure of bridge table 746 are given below in reference to FIG. 13.

DV Filter 734 de-encapsulates the packet by stripping the encapsulating headers and added data to create packet 710, which is associated with Fence 1 as indicated in the Fence ID of the fence protocol data. DV Filter then checks for ports associated with Fence 1 and the destination address of packet 710, which is every node since it is a broadcast address. Since VM B 742 is associated with Fence 1 738, packet 712 is delivered to VM B 742. On the other hand, VM B 744 will not get delivery of the packet or frame because the DV Filter for VM B 744 (not shown) will detect that the frame is for Fence 1 nodes and will drop the frame because VM B 744 does not belong to Fence 1. It belongs to Fence 2.
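The delivery decision described above can be illustrated with a minimal sketch. The names `Port` and `should_deliver`, the MAC addresses, and the representation of a missing Fence ID as `None` are assumptions made for illustration and do not appear in the described embodiments.

```python
from dataclasses import dataclass
from typing import Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Port:
    """Hypothetical view of a VNIC port: the VM's MAC and the fence it belongs to."""
    vm_mac: str
    fence_id: int


def should_deliver(port: Port, packet_fence_id: Optional[int], inner_dst_mac: str) -> bool:
    """Return True only if the de-encapsulated frame may enter this port."""
    if packet_fence_id is None:
        # Frames without a Fence ID never make it inside the fence.
        return False
    if packet_fence_id != port.fence_id:
        # Frame belongs to a different private network: drop it.
        return False
    # Same fence: accept broadcasts and frames addressed to this VM.
    return inner_dst_mac in (BROADCAST, port.vm_mac)


# Illustrative addresses only; the clone keeps the same MAC but sits in Fence 2.
vm_b_742 = Port(vm_mac="00:50:56:00:00:0b", fence_id=1)
vm_b_744 = Port(vm_mac="00:50:56:00:00:0b", fence_id=2)
```

With these definitions, a broadcast frame carrying Fence ID 1 would be accepted for the port of VM B 742 and dropped for the port of its clone VM B 744.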

It should be noted that this mechanism provides an added level of security by assuring that the fence is isolated. Packets that have no Fence ID will be dropped and will never make it inside the fence.

FIG. 8 illustrates the flow of the response packet to the broadcast, according to one embodiment. VM B 742 replies to packet 712 with packet 802 addressed to VM A 720. DV Filter 734 receives packet 804, associated with Fence 1 because VM B's port is associated with Fence 1. DV Filter 734 checks bridge table 746 and finds an entry for VM A indicating that VM A is executing in Host 1. DV Filter proceeds to create new Ethernet packet 806 by encapsulating packet 804. The addresses in the encapsulation header are created according to the process described in reference to FIG. 6. For example, the destination Ethernet address is constructed by combining the fence OUI (24 bits), the installation identifier (8 bits), and the number associated with Host 1 (16 bits). The fence ID for Fence 1 is added after the header and before the original packet, as previously described.
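As an illustration of the address construction just described, the following sketch packs a 24-bit fence OUI, an 8-bit installation identifier, and a 16-bit host number into a 48-bit Ethernet address. The function name `build_outer_mac` is hypothetical; the sample values (OUI 00:13:f5, installation ID 3e, host ID 02:c2) are the ones that appear in the bridge table example of FIG. 13.

```python
def build_outer_mac(fence_oui: int, installation_id: int, host_id: int) -> str:
    """Pack OUI (24 bits) | installation ID (8 bits) | host ID (16 bits) into a MAC string."""
    if not 0 <= fence_oui < (1 << 24):
        raise ValueError("fence OUI must fit in 24 bits")
    if not 0 <= installation_id < (1 << 8):
        raise ValueError("installation ID must fit in 8 bits")
    if not 0 <= host_id < (1 << 16):
        raise ValueError("host ID must fit in 16 bits")
    # The OUI occupies the top three bytes, followed by the installation ID
    # and then the two-byte host number.
    value = (fence_oui << 24) | (installation_id << 16) | host_id
    raw = value.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in raw)


outer_mac = build_outer_mac(0x0013F5, 0x3E, 0x02C2)  # "00:13:f5:3e:02:c2"
```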

After packet 806 is unicast via the physical network, Host 1 receives packet 808, which is processed in similar fashion as described in reference to FIG. 7, except that the destination address is not a broadcast address. DV Filter 724 determines that the packet is from VM B in Fence 1. Since there is not an entry for VM B in bridge table 814, a new entry for VM B is added to bridge table 814 indicating that VM B is executing in Host 2. Additionally, DV Filter 724 proceeds to strip packet 808 to restore original packet 802 sent by VM B 742, by taking out the added header and the additional payload ahead of the original packet. This results in packet 810, which is associated with Fence 1 because the payload in packet 808 indicates that the packet is for a Fence 1 node. Since VM A's port is associated with Fence 1 722 and the Ethernet destination address, packet 812 is successfully delivered to VM A 720.

FIG. 9A illustrates the flow of a packet travelling between VMs in the same host, according to one embodiment. Packet 902 is sent from VM A 920 with a destination address of VM C 928, with both VMs executing in the same host. The process is similar to the one previously described in FIGS. 7 and 8, except that the packet does not travel over the physical network and is “turned around” by Distributed vSwitch 926. Thus, packet 902 is sent to VNIC 930, which in turn sends packet 904 to DV Filter 922.

It should be noted that although packets are described herein as travelling (sent and received) among the different entities of the chain of communication, it is not necessary to actually transfer the whole packet from one module to the next. For example, a pointer to the message may be passed between VNIC 930 and DV filter without having to actually make a copy of the packet.

DV filter for VM A 922 checks bridge table 924 and determines that the destination VM C is executing in Host 1. The corresponding encapsulation is performed to create packet 906 which is forwarded to distributed vSwitch 926 via output leaf 932. VSwitch 926 determines that the destination address of packet 906 is for a VM inside the host and “turns the packet around” by forwarding packet 908 to the DV Filter for VM C (not shown) via input leaf 934. The DV Filter for VM C strips the headers and, after checking the destination address and the Fence ID, delivers the packet to VM C's port in VNIC 930.

FIG. 9B illustrates the flow of an Internet Protocol (IP) packet, according to one embodiment. FIG. 9B illustrates sending an IP packet from VM A 970 in Host 1 to VM B 972 in Host 2. The process is similar to the one described in FIG. 7, except that there is not a broadcast address but instead a unicast address, and the bridge tables in the DV filters already have the pertinent entries as VMs A and B have been running for a period of time.

Thus, encapsulated packet 956, leaving DV Filter 974, includes source and destination address associated with the IDs of hosts 1 and 2, respectively. When DV Filter 976 for VM B receives packet 958, it does not create a new entry in the bridge table because the entry for VM A already exists. Packet 958 is forwarded to VM B via the distributed switch and the VNIC port, as previously described.

It should be noted that packet 952 is an Ethernet frame and that the scenario described in FIG. 9B is for VMs that are executing in hosts with layer 2 connectivity. If the destination VM were in a host executing in a different LAN segment (i.e., a different data link layer segment), then MAC in MAC encapsulation would not work because the packet would be sent to a router in the network which may not be aware of the private networking scheme for fencing and would not work properly as the IP header is not where the router would expect it. In this case other fencing solutions for hosts on different networks can be combined with embodiments of the invention. Solutions for internetwork fencing are described in U.S. patent application Ser. No. 12/571,224 (Attorney Docket No. A341), filed Sep. 30, 2009, and entitled “PRIVATE ALLOCATED NETWORKS OVER SHARED COMMUNICATIONS INFRASTRUCTURE”, which is incorporated herein by reference. Also, a VLAN network can be used to provide layer-2 connectivity to hosts in different networks.

FIG. 10 illustrates the update of bridge tables when a VM migrates to a different host, according to one embodiment. When VM A 156 moves from Host 1 150 to Host 2 152, VM A 156 sends a Reverse Address Resolution Protocol (RARP) packet 160. RARP is a computer networking protocol used by a host computer to request its IP address from an administrative host, when the host computer has available its Link Layer or hardware address, such as a MAC address.

Since packet 160 is a broadcast packet, packet 160 will reach all nodes in the same private network as VM A 156. When packet 166 is received by DV Filter 172 in Host 2 154, DV Filter 172 detects that the message is from VM A in Host 3. Since the bridge table entry for VM A has Host 1 as the host for VM A, and the new packet indicates that VM A is now executing in Host 3, the entry for VM A in bridge table 174 is updated to reflect this change. The packet is then delivered to VM B 158 because VM B is part of the private network in Fence 1 and this is a broadcast packet.
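The learning and update behavior of the bridge table may be sketched as follows. This is a simplified illustration that assumes a plain mapping from inner (VM) MAC address to outer (host) MAC address; the class name `BridgeTable`, its methods, and the VM MAC address are hypothetical, and the flags described in reference to FIG. 13 are omitted.

```python
from typing import Dict, Optional


class BridgeTable:
    """Hypothetical per-port table mapping a VM's inner MAC to its host's outer MAC."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def learn(self, inner_mac: str, outer_mac: str) -> str:
        """Record where a VM was last seen; report what changed."""
        previous = self._entries.get(inner_mac)
        self._entries[inner_mac] = outer_mac
        if previous is None:
            return "added"      # first packet seen from this VM
        if previous != outer_mac:
            return "updated"    # the VM now sends from a different host
        return "unchanged"

    def lookup(self, inner_mac: str) -> Optional[str]:
        """Return the outer MAC to use when encapsulating for this VM, if known."""
        return self._entries.get(inner_mac)


table = BridgeTable()
vm_a = "00:50:56:00:00:0a"                            # illustrative VM MAC
first = table.learn(vm_a, "00:13:f5:3e:02:c2")        # VM A first seen on one host
moved = table.learn(vm_a, "00:13:f5:3e:03:02")        # broadcast arrives from another host
```

In this sketch the first call adds the entry and the second call, which corresponds to the broadcast sent after a migration, overwrites the outer address so that later unicast packets are encapsulated for the new host.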

FIG. 11 shows the structure of an MTU configuration table, according to one embodiment. The MTU configuration table is used to store the MTU for each network. Thus, each entry (shown as columns in the table) includes a LAN identifier and an MTU for the corresponding LAN. When encapsulation results in a packet that is bigger than the MTU for that network, the packet has to be fragmented. Each fragment is sent separately to the destination host with a different fragment sequence number. The DV filter at the destination will combine the fragments to reconstruct the original encapsulated packet.
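The fragmentation and reassembly steps can be illustrated with the following sketch. It assumes, for illustration only, that each fragment carries a sequence number and a "more fragments" flag, and that the encapsulation adds a fixed number of bytes (`overhead`) to each fragment; the function names and the numeric values in the usage lines are hypothetical and are not taken from the described embodiments.

```python
from typing import List, Tuple

# (fragment sequence number, more-fragments flag, chunk of the original frame)
Fragment = Tuple[int, bool, bytes]


def fragment(frame: bytes, mtu: int, overhead: int) -> List[Fragment]:
    """Split a frame so that each chunk plus the encapsulation overhead fits in the MTU."""
    room = mtu - overhead
    if room <= 0:
        raise ValueError("MTU too small for the encapsulation overhead")
    chunks = [frame[i:i + room] for i in range(0, len(frame), room)] or [b""]
    last = len(chunks) - 1
    return [(seq, seq < last, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> bytes:
    """Rebuild the original frame from fragments that may arrive out of order."""
    ordered = sorted(fragments, key=lambda f: f[0])
    if not ordered or ordered[-1][1]:
        raise ValueError("final fragment is missing")
    if [f[0] for f in ordered] != list(range(len(ordered))):
        raise ValueError("gap or duplicate in fragment sequence numbers")
    return b"".join(chunk for _, _, chunk in ordered)


original = bytes(range(200))
parts = fragment(original, mtu=100, overhead=20)   # 80 bytes of room per fragment
restored = reassemble(list(reversed(parts)))       # arrival order does not matter
```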

As previously described, a way to avoid fragmentation is by reducing the MTU in the network configuration of the host. For example, if the MTU of a network is 1,500, the network can be configured in the VM as having an MTU of 1,336, reserving 144 bits for the encapsulation by the DV Filter.

FIG. 12 shows one embodiment of an active-ports table. The active-ports table has one entry (in each row) for each active VNIC and includes an OPI field, a LAN ID, and the MTU. The OPI includes virtual lab server parameters “installation ID” and “fence ID”. The installation ID identifies a particular implementation of a fenced configuration, and different clones will have different fence IDs. The fence ID identifies the fence associated with the VNIC. The LAN ID is an internal identifier of the underlying network that the private network (fence) overlays. Different fences may share the underlying LAN. The MTU indicates the maximum transmission unit on the network.

FIG. 13 shows an embodiment of a bridge table. As previously described, the bridge table resides in the DV filter and is used to keep the address of the destination hosts where the VMs of the private network are executing. The table is organized by VNICs, also referred to as ports, each associated with the VNIC for a VM. The example shown in FIG. 13 includes entries for 3 ports, 0x4100b9f869e0, 0x4100b9f86d40, and 0x4100b9f86f30. Port 0x4100b9f869e0 has no entries in the bridge table yet, and the other two ports have 4 entries. Each of these entries includes an inner MAC address, an outer MAC address, a “used” flag, an “age” value, and a “seen” flag.

The inner MAC address corresponds to the Ethernet address of another VM in the same private network. The outer MAC address corresponds to the Ethernet address of the host that the VM is on and includes the address that would be added in an encapsulating header to send a message to the corresponding VM. Of course, the address may be constructed as described in reference to FIG. 6. For example, the entry in bridge table 746 of FIG. 7 holds the inner MAC address of VM A, and the outer MAC address for Host 1. The used flag indicates if the entry is being used, the age value indicates if the entry has been updated in a predetermined period of time, and the seen flag indicates if the entry has been used recently.

The tables in FIGS. 11-13 are interrelated. For example, the second entry in the active-ports table of FIG. 12 is for port 0x4100b9f86d40. The OPI is “3e,0000fb”, which means that the installation ID is 3e and the fence ID is 0000fb. In the bridge table of FIG. 13, it can be observed that outer MAC addresses for port 0x4100b9f86d40 have the same OUI (00:13:f5), and the same installation ID (3e). The remainder of the outer MAC address corresponds to host IDs for different hosts (02:c2, 02:e2, 03:02, and 02:f2).
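The relationship between the OPI and the outer MAC addresses can be checked with a short sketch. The helper names `parse_opi` and `split_outer_mac` are hypothetical; the OPI string, OUI, installation ID, and host IDs are the example values given above.

```python
from typing import Tuple


def parse_opi(opi: str) -> Tuple[int, int]:
    """Split an OPI such as "3e,0000fb" into (installation ID, fence ID)."""
    install_hex, fence_hex = opi.split(",")
    return int(install_hex, 16), int(fence_hex, 16)


def split_outer_mac(mac: str) -> Tuple[str, int, str]:
    """Split an outer MAC into (OUI, installation ID, host ID)."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    oui = ":".join(f"{b:02x}" for b in raw[:3])
    host_id = ":".join(f"{b:02x}" for b in raw[4:])
    return oui, raw[3], host_id


installation_id, fence_id = parse_opi("3e,0000fb")
outer_macs = [f"00:13:f5:3e:{host}" for host in ("02:c2", "02:e2", "03:02", "02:f2")]
# Every outer MAC for the port should carry the port's installation ID.
consistent = all(split_outer_mac(m)[1] == installation_id for m in outer_macs)
```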

FIG. 14 shows the process flow of a method for private networking within a virtual infrastructure, in accordance with one embodiment of the invention. The process includes operation 1402 for sending a packet on a private virtual network from a first VM in a first host. The first VM and a second VM are members of a fenced group of computers that have exclusive direct access to the private virtual network, such that VMs outside the fenced group do not have direct access to packets on the private virtual network. From operation 1402, the method flows to operation 1404 for encapsulating the packet at the first host to include a new header and a fence identifier for the fenced group. See for example DV filter 724 of FIG. 7.

The packet is received at a host where the second VM is executing, in operation 1406, and the method continues in operation 1408 for de-encapsulating the packet to extract the new header and the fence identifier. In operation 1410, the de-encapsulated packet is delivered to the second VM after validating that the destination address in the packet and the fence identifier correspond to the address of the second VM and the fence identifier of the second VM.
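Operations 1404 through 1410 can be sketched end to end as follows. The byte layout used here (a 14-byte outer Ethernet header followed by a 3-byte fence identifier and then the original frame) is a simplifying assumption made for illustration and is not the field layout of FIGS. 5 and 6; the EtherType value, the function names, and the VM MAC addresses are likewise placeholders.

```python
import struct
from typing import Optional, Tuple

PLACEHOLDER_ETHERTYPE = 0x88B5  # placeholder value, not taken from the source
OUTER_HEADER_LEN = 14           # destination MAC + source MAC + EtherType
FENCE_ID_LEN = 3                # assumed 24-bit fence identifier


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def encapsulate(frame: bytes, outer_dst: str, outer_src: str, fence_id: int) -> bytes:
    """Operation 1404: prepend a new header and the fence identifier."""
    header = mac_bytes(outer_dst) + mac_bytes(outer_src) + struct.pack("!H", PLACEHOLDER_ETHERTYPE)
    return header + fence_id.to_bytes(FENCE_ID_LEN, "big") + frame


def decapsulate(packet: bytes) -> Tuple[bytes, int, bytes]:
    """Operation 1408: return (outer source MAC, fence identifier, original frame)."""
    if len(packet) < OUTER_HEADER_LEN + FENCE_ID_LEN:
        raise ValueError("packet too short to be an encapsulated frame")
    body = OUTER_HEADER_LEN + FENCE_ID_LEN
    fence_id = int.from_bytes(packet[OUTER_HEADER_LEN:body], "big")
    return packet[6:12], fence_id, packet[body:]


def deliver(packet: bytes, vm_mac: str, vm_fence_id: int) -> Optional[bytes]:
    """Operation 1410: hand over the frame only if address and fence both match."""
    _, fence_id, frame = decapsulate(packet)
    if fence_id != vm_fence_id or frame[:6] != mac_bytes(vm_mac):
        return None
    return frame


vm_a, vm_b = "00:50:56:00:00:0a", "00:50:56:00:00:0b"   # illustrative VM MACs
inner = mac_bytes(vm_b) + mac_bytes(vm_a) + b"\x08\x00" + b"payload"
wire = encapsulate(inner, "00:13:f5:3e:02:e2", "00:13:f5:3e:02:c2", 0x0000FB)
```

Under these assumptions, `deliver(wire, vm_b, 0x0000FB)` would return the original frame, while the same call with a different fence identifier would return `None`.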

Embodiments of the present invention may be practiced with various computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a network.

With the above embodiments in mind, it should be understood that the invention can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purpose, such as a special purpose computer. When defined as a special purpose computer, the computer can also perform other processing, program execution or routines that are not part of the special purpose, while still being capable of operating for the special purpose. Alternatively, the operations may be processed by a general purpose computer selectively activated or configured by one or more computer programs stored in the computer memory, cache, or obtained over a network. When data is obtained over a network the data may be processed by other computers on the network, e.g., a cloud of computing resources.

The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes and other optical and non-optical data storage devices. The computer readable medium can include computer readable tangible medium distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed in between operations, or operations may be adjusted so that they occur at slightly different times, or may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

1-20. (canceled)
21. A method of forwarding packets through a particular private virtual network (PVN) defined over a shared physical network along with a plurality of other PVNs, the method comprising: at a filter defined on a first host computer: receiving a packet sent by a first machine executing on the first host computer, the packet addressed to a second machine executing on a second host computer; encapsulating the packet with an overlay-network encapsulation header that stores an identifier identifying the particular PVN; and sending the encapsulated packet to the second machine over the shared physical network.
22. The method of claim 21, wherein the filter executing on the first host computer is a first filter, and a second filter executes on the second host computer and with the first filter forms a distributed virtual filter that adds and removes encapsulating headers with the particular PVN identifier to allow the first and second machines to exchange packets associated with the particular PVN.
23. The method of claim 21, wherein the first machine is a first virtual machine (VM) executing on the first host computer, and the second machine is a second VM executing on the second host machine.
24. The method of claim 23, wherein each VM has an associated virtual network interface card (VNIC), and the filter is associated with a VNIC of the first VM.
25. The method of claim 23, wherein the filter is a module that executes on the first host computer outside of the first VM and that processes packets sent by the first VM.
26. The method of claim 21, wherein receiving the packet sent by the first machine comprises obtaining the packet as the packet passes along an egress data path from the first machine to a physical network interface card (PNIC) of the first host computer.
27. The method of claim 26, wherein the packet is obtained before the packet is processed by a software switch executing on the first host computer, sending the encapsulated packet to the second machine comprises sending the packet to the software switch for forwarding to the PNIC to send the encapsulated packet to the physical network, the physical network forwarding the encapsulated packet to the second host computer, which removes the encapsulated overlay-network header, uses a destination address in an original header of the packet to identify the second machine, and passes the packet to the second machine.
28. The method of claim 21, wherein the filter comprises a bridge table that stores addresses of destination hosts where machines of the private virtual network execute.
29. The method of claim 21 further comprising: when the size of the encapsulated packet exceeds the maximum-transmission unit (MTU) for the network, fragmenting the packet into at least two packets and encapsulating each of the two packets with an overlay encapsulation header before sending the encapsulated packets over the physical network.
30. The method of claim 29, wherein encapsulating the packet with an overlay-network encapsulation header further comprises encapsulating the packet with (i) a 2-bit field to indicate whether the packet has been fragmented and (ii) a fragment sequence number that indicates which fragment number corresponds to the packet.
31. A non-transitory machine readable medium storing a filter for forwarding packets through a particular private virtual network (PVN) defined over a shared physical network along with a plurality of other PVNs, the filter for execution by at least one hardware processing unit of a first host computer, the filter comprising sets of instructions for: receiving a packet sent by a first machine executing on the first host computer, the packet addressed to a second machine executing on a second host computer; encapsulating the packet with an overlay-network encapsulation header that stores an identifier identifying the particular PVN; and sending the encapsulated packet to the second machine over the shared physical network.
32. The non-transitory machine readable medium of claim 31, wherein the filter executing on the first host computer is a first filter, and a second filter executes on the second host computer and with the first filter forms a distributed virtual filter that adds and removes encapsulating headers with the particular PVN identifier to allow the first and second machines to exchange packets associated with the particular PVN.
33. The non-transitory machine readable medium of claim 31, wherein the first machine is a first virtual machine (VM) executing on the first host computer, and the second machine is a second VM executing on the second host machine.
34. The non-transitory machine readable medium of claim 33, wherein each VM has an associated virtual network interface card (VNIC), and the filter is associated with a VNIC of the first VM.
35. The non-transitory machine readable medium of claim 33, wherein the filter is a program that executes on the first host computer outside of the first VM and that processes packets sent by the first VM.
36. The non-transitory machine readable medium of claim 31, wherein the set of instructions for receiving the packet sent by the first machine comprises a set of instructions for obtaining the packet as the packet passes along an egress data path from the first machine to a physical network interface card (PNIC) of the first host computer.
37. The non-transitory machine readable medium of claim 36, wherein the packet is obtained before the packet is processed by a software switch executing on the first host computer, the set of instructions for sending the encapsulated packet to the second machine comprises a set of instructions for sending the packet to the software switch for forwarding to the PNIC to send the encapsulated packet to the physical network, the physical network forwarding the encapsulated packet to the second host computer, which removes the encapsulated overlay-network header, uses a destination address in an original header of the packet to identify the second machine, and passes the packet to the second machine.
38. The non-transitory machine readable medium of claim 31, wherein the filter uses a bridge table that stores addresses of destination hosts where machines of the private virtual network execute.
39. The non-transitory machine readable medium of claim 31, wherein the filter further comprises sets of instructions for: fragmenting the packet into at least two packets when the size of the encapsulated packet exceeds the maximum-transmission unit (MTU) for the network, and encapsulating each of the two packets with an overlay encapsulation header before sending the encapsulated packets over the physical network.
40. The non-transitory machine readable medium of claim 39, wherein the set of instructions for encapsulating the packet with an overlay-network encapsulation header further comprises a set of instructions for encapsulating the packet with (i) a 2-bit field to indicate whether the packet has been fragmented and (ii) a fragment sequence number that indicates which fragment number corresponds to the packet.